UKParl: A Data Set for Topic Detection with Semantically Annotated Text

Authors

  • Federico Nanni
  • Mahmoud Osman
  • Yi-Ru Cheng
  • Simone Paolo Ponzetto
  • Laura Dietz
Abstract

We present a dataset created from the Hansard House of Commons archived debates of the UK parliament (2013-2016). The resource includes fine-grained topic annotations at the document level and is enriched with additional semantic information, such as that provided by entity links. We assess the quality and usefulness of this corpus with two benchmarks on topic classification and ranking.


Similar resources

Application of Lexical Topic Models to Protein Interaction Sentence Prediction

Topic models can be used to improve classification of protein-protein interactions (PPIs) by condensing lexical knowledge available in unannotated biomedical text into a semantically-informed kernel smoothing matrix. Detection of sentences that describe PPIs is difficult due to lack of annotated data. Furthermore, sentences generally contain a small percentage of the features, thus leading to s...


Mark Alan Finlayson: Inferring Propp's Functions from Semantically Annotated Text

Vladimir Propp's Morphology of the Folktale is a seminal work in folkloristics and a compelling subject of computational study. I demonstrate a technique for learning Propp's functions from semantically annotated text. Fifteen folktales from Propp's corpus were annotated for semantic roles, co-reference, temporal structure, event sentiment, and dramatis personae. I derived a set of merge rules ...


Topic Models for Semantically Annotated Document Collections

Increasingly, web document collections such as PubMed and DBPedia, but also social bookmarking systems, are annotated with semantic meta data. Given that the number of semantically annotated document collections is expected to increase in the near future, it is of interest to analyze if topic models might be able to play a larger role. Since most of the time, annotations are noisy and even huma...


Using Topic Modeling and Similarity Thresholds to Detect Events

This paper presents a Retrospective Event Detection algorithm, called Eventy-Topic Detection (ETD), which automatically generates topics that describe events in a large, temporal text corpus. Our approach leverages the structure of the topic modeling framework, specifically the Latent Dirichlet Allocation (LDA), to generate topics which are then later labeled as Eventy-Topics or non-Eventy-Topi...


LogicalFormBanks, the Next Generation of Semantically Annotated Corpora: key issues in construction methodology

The next generation of semantically annotated corpora will move a step further from raw text to meaning representation. The information to be encoded will go beyond the phrase-level information stored in PropBanks and represent sentence-level semantic information. In this paper I address issues that call to be explicitly articulated concerning the construction methodology of corpora annotated wi...




Publication date: 2018